Gaijin: A Bootstrapping, Template-Driven Approach to Example-Based MT

Authors

  • Tony Veale
  • Andy Way
Abstract

Example-based Machine Translation (EBMT) is a recent approach to MT that offers robustness, scalability and graceful degradation, deriving its competence not from explicit linguistic models of the source and target languages, but from the wealth of bilingual corpora that are now available. Gaijin is such a system, employing statistical methods, string-matching, case-based reasoning and template-matching to provide a linguistics-lite EBMT solution. The only linguistics employed by Gaijin is a psycholinguistic constraint, the marker hypothesis, that is minimal, simple to apply, and arguably universal. The scope and current state of Gaijin are described, and some initial evaluation results are reported.
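
The marker hypothesis holds that closed-class words (determiners, prepositions, conjunctions and the like) signal the start of syntactic units on the surface of a sentence. The abstract does not spell out how Gaijin applies the constraint, so the following is only a minimal sketch of marker-based segmentation under stated assumptions: the `MARKERS` set is a tiny hypothetical English list, and `marker_segments` is an illustrative name, not part of the system described.

```python
# Hypothetical, deliberately tiny marker set (an assumption for illustration);
# a real system would use a fuller list of closed-class words per language.
MARKERS = {"the", "a", "an", "in", "on", "of", "to", "with", "and", "or", "but"}


def marker_segments(sentence):
    """Split a sentence into chunks, opening a new chunk at each marker word."""
    segments = []
    current = []
    for token in sentence.split():
        is_marker = token.lower() in MARKERS
        # Open a new chunk at a marker only if the current chunk already holds
        # a non-marker word, so runs of adjacent markers ("on the") stay together.
        if is_marker and any(t.lower() not in MARKERS for t in current):
            segments.append(current)
            current = []
        current.append(token)
    if current:
        segments.append(current)
    return [" ".join(chunk) for chunk in segments]


print(marker_segments("The cat sat on the mat"))
# ['The cat sat', 'on the mat']
```

Chunks of this kind could then be aligned across the two sides of a bilingual corpus and generalised into templates. How Gaijin itself does this is described in the full paper, not in this sketch.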

Similar resources

Iterative, MT-based Sentence Alignment of Parallel Texts

Recent research has shown that MT-based sentence alignment is a robust approach for noisy parallel texts. However, using Machine Translation for sentence alignment causes a chicken-and-egg problem: to train a corpus-based MT system, we need sentence-aligned data, and MT-based sentence alignment depends on an MT system. We describe a bootstrapping approach to sentence alignment that resolves thi...

A survey of Data Driven Machine Translation

Machine Translation (MT) refers to the use of computers for translating automatically from one language to another. The differences between source and target languages and the inherent ambiguity of the source language itself make MT a very difficult problem. Traditional approaches to MT have relied on humans giving linguistic knowledge in the form of rules to transform text. Given the vastness ...

Coping with Data-sparsity in Example-based Machine Translation

Data-driven Machine Translation (MT) systems have been found to require large amounts of data to function well. However, obtaining parallel texts for many languages is time-consuming, expensive and difficult. This thesis aims at improving translation quality for languages that have limited resources by making use of the available data more efficiently. Templates or generalizations of sentence-p...

Hybrid Strategies for better products and shorter time-to-market

The main Lingenio MT products are based on rule-based architectures. In the presentation we show how knowledge from corpora is integrated into the systems using the language analysis and translation components in a bootstrapping approach. This relates to the bilingual dictionaries, but also to learning decisions concerning the selection of syntactic rules and semantic readings in parsing and sem...

Electrophoretic Synthesis of Titanium Oxide Nanotubes

In the current research project, the sol-gel electrophoresis technique was used to grow titanium dioxide (TiO2) nanotubes. A titanium sol was prepared using organometallic precursors of titanium to fill the template channels. The prepared sol was driven into the nanopores of porous anodic aluminum oxide templates under the influence of a DC electric field to form nanotubes on the pore walls. Tube fo...

Publication date: 1997